A Script Independent Technique for Extraction of Characters from Handwritten Word Images
Abstract
This paper reports a script-independent technique for segmenting characters from word images. Word-to-character segmentation is an important preprocessing step in optical character recognition (OCR). In handwritten text, however, touching characters reduce the accuracy of character segmentation. Here, handwritten words in four scripts, namely Bangla, Devanagari, Gurmukhi and Syloti, are used as test samples. All of these scripts are characterized by a distinct line along the top of most of the characters forming a word, called the headline or Matra. Unlike in English script, the components of characters in these handwritten scripts often encircle the main character, which makes conventional segmentation methods inapplicable. The segmentation technique uses two fuzzy features: one to identify the Matra region and one to identify potential segmentation points. Experimental results on a sample of 400 handwritten word images covering Bangla, Devanagari, Gurmukhi and Syloti show success rates of 95.41%, 93.61%, 91.23% and 92.37%, respectively.
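The abstract does not define the two fuzzy features, so the following is only a minimal sketch of the general idea, with crisp projection profiles swapped in for the fuzzy membership functions. It assumes a binarized word image (1 = ink, 0 = background); the function names `find_headline_band` and `candidate_cut_columns` and the `ratio` parameter are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def find_headline_band(img: Image, ratio: float = 0.8) -> Tuple[int, int]:
    """Return (top, bottom) rows of the densest horizontal band.

    Assumption: the Matra shows up as the peak of the horizontal
    projection profile. This is a crisp stand-in for a fuzzy feature.
    """
    row_sums = [sum(row) for row in img]
    peak = max(row_sums)
    if peak == 0:
        raise ValueError("image contains no ink")
    peak_row = row_sums.index(peak)
    threshold = ratio * peak
    top = bottom = peak_row
    # Grow the band while neighbouring rows are nearly as dense as the peak.
    while top > 0 and row_sums[top - 1] >= threshold:
        top -= 1
    while bottom < len(img) - 1 and row_sums[bottom + 1] >= threshold:
        bottom += 1
    return top, bottom


def candidate_cut_columns(img: Image, band: Tuple[int, int]) -> List[int]:
    """Return the middle column of every ink-free gap below the band.

    Only gaps lying between two inked columns are reported, so the
    blank margins of the word image are ignored.
    """
    _, bottom = band
    below = img[bottom + 1:]
    width = len(img[0])
    col_has_ink = [any(row[c] for row in below) for c in range(width)]
    inked = [c for c, has_ink in enumerate(col_has_ink) if has_ink]
    if not inked:
        return []
    cuts = []
    c, last = inked[0], inked[-1]
    while c <= last:
        if col_has_ink[c]:
            c += 1
            continue
        start = c
        while not col_has_ink[c]:  # stops at or before the last inked column
            c += 1
        cuts.append((start + c - 1) // 2)
    return cuts


# Toy word: a full-width headline in row 1 with two strokes hanging below it.
WORD = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 0],
]
band = find_headline_band(WORD)
cuts = candidate_cut_columns(WORD, band)
```

A sketch like this only finds gaps where the strokes below the headline do not touch. Touching characters and components that encircle the main character are exactly the cases the abstract says defeat conventional methods, and they would need the paper's fuzzy features or a comparable approach.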
Similar resources
An improved offline handwritten character segmentation algorithm for Bangla script
Effective segmentation of offline handwritten word images of unconstrained handwritten Bangla script is a challenging problem in Optical Character Recognition (OCR) applications. The presence of a continuous horizontal line called ‘Matra’ is an important feature of this script. However, in unconstrained cursive handwriting, the Matra can be wavy or discontinuous, which makes the problem of segmentation diffic...
Word Extraction and Character Segmentation from Text Lines of Unconstrained Handwritten Bangla Document Images
In this paper, a novel approach for word extraction and character segmentation from the handwritten Bangla document images is reported. At first, a modified Run Length Smoothing Algorithm (RLSA), called Spiral Run Length Smearing Algorithm (SRLSA), is applied for the extraction of words from the text lines of unconstrained handwritten Bangla document images. This technique has helped to overcom...
Zernike Moment Feature Extraction for Handwritten Devanagari (Marathi) Compound Character Recognition
Compound character recognition in Devanagari script is a challenging task, since the characters are complex in structure and can be modified by writing a combination of two or more characters. These compound characters occur with a frequency of 12 to 15% in the Devanagari script. Moment-based techniques are being successfully applied to several image processing problems and represent a fundamental too...
Statistical Feature Extraction Methods for Isolated Handwritten Gurumukhi Script
In this paper we use moment-based statistical methods for feature extraction, such as the Zernike and pseudo-Zernike methods. Isolated handwritten Gurmukhi characters are used for feature extraction. Our database consists of 75-115 samples of each of the 40 characters of Gurmukhi script, collected from different writers. These samples are preprocessed, normalized and scaled to a size of 48×48. Feature extrac...
Extraction of Characters and Modifiers from Handwritten Gujarati Words
Research activity related to Optical Character Recognition (OCR) is very limited for almost all Indian languages. Gujarati is one of the scripts for which very little literature is available as far as OCR is concerned. This paper describes one of the important phases of OCR: the segmentation of handwritten words into their basic components, namely basic characters, conjunct character...